Abstract

In  this  paper  we  describe  a  new  efficient  algorithm  for  recognizing  3D  objects  by  combining  photometric 
and  geometric  invariants.  Some  photometric  properties  are  derived,  that  are  invariant  to  the  changes 
of  illumination  and  to  relative  object  motion  with  respect  to  the  camera  and/or  the  lighting  source  in 
3D space. We argue that conventional color constancy algorithms cannot be used in the recognition of
3D objects. Further, we show that recognition does not require full color constancy; rather, it needs only
a property that remains unchanged under varying lighting conditions and object poses. Combining
the  derived  color  invariants  and  the  spatial  constraints  on  the  object  surfaces,  we  identify  corresponding 
positions  in  the  model  and  the  data  space  coordinates,  using  centroid  invariance  of  corresponding  groups 
of  feature  positions.  Tests  are  given  to  show  the  stability  and  efficiency  of  our  approach  to  3D  object 
recognition. 
1  Introduction

A typical approach to model-based object recognition [14] matches stored geometric models against features extracted from an image, where the features are typically localized geometric events, such as vertices. Objects are considered to have undergone a transformation in space to yield a novel view for the image. To solve for this transformation explicitly, recognition methods use matches of features to hypothesize a transformation, which is used to align the model with the image and select the best-fit pair of transformation and model. While this approach to recognition has achieved considerable success, there still remain practical problems to be solved.

One such problem is the computational complexity of the method. For example, even with popular algorithms (e.g., [20, 32]), to recognize an object with m features from an image with n features, we must examine combinations of hypotheses, where m and n can easily be on the order of several hundred in natural pictures. A second problem is the tolerance of the algorithm to scene clutter. To verify the hypothesized transformation, object recognition algorithms have to collect evidence of actual correspondences characterized by that transformation. This is usually done by looking for the nearest image features around the transformed model features, or equivalently by casting votes into a hash table of parameters, such as affine invariant parameters, leading to a correspondence (e.g., [24]). In either case, when features are extracted from the image with perturbations, and the image is so cluttered that the feature distribution is too dense, it is difficult to tell whether a detected image feature actually corresponds to the model feature or just happened to fall close to the transformed model feature. This issue has been extensively analyzed, both theoretically and empirically, yielding arguments about the limitations of geometric-feature-based approaches to recognition (e.g., [15, 1, 14]).

Considering the limitations of conventional approaches to recognition that depend solely on geometric features, it is natural to turn to cues other than simple local geometric features. One such candidate is photometric information like color, because color often characterizes objects well and is almost invariant to changes of view and lighting conditions. In parallel with geometry, the color properties of the object surface should be a strong key to the perception of the surface. However, most authors who have exploited color in recognition used it simply for segmentation, e.g., [5, 29, 16], mostly because color is considered to contribute more to building salient features on the object surface than to giving precise information on the locations and poses of objects. Exceptions include Swain [27, 28] and Nayar et al. [26], who have used photometric information more directly for recognition, for the indexing and matching processes respectively. At the same time, however, they abandoned the use of local geometric features, which are still very useful in predicting the locations and poses of objects. Swain used only a color histogram to represent objects and matched it over the image to identify the object and localize its presence in the image. Nayar et al. proposed a photometric invariant for matching regions with consistent colors, given the partitioned model and image derived by some other color properties. Their method therefore requires a preliminary segmentation of the image into regions having consistent colors.

In this paper, we attempt to exploit both geometric and photometric cues to recognize 3D objects, by combining them more tightly. Our goal is to develop an efficient and reliable algorithm for recognition by taking advantage of the merits of both cues: the ability of color to generate larger and thus more salient features reliably and to add selectivity to features, which enables more efficient and reliable object recognition, and the rich information carried by the set of local geometric features, which is useful in accurately recovering the transformation that generated the image from the model. To realize this, we have developed new photometric invariants that are suitable for this approach. We then combine the proposed photometric properties with the Centroid Alignment approach of corresponding geometric feature groups in the model and the image, which we have recently proposed [25]. This strategy gives an efficient and reliable algorithm for recognizing 3D objects. In our testing, it took only 0.2 seconds to derive corresponding positions in the model and the image for natural pictures.

2  Some  photometric  invariants 

In this section, we develop some photometric invariants that can be used as strong cues in the recognition of 3D objects. The invariants are related to the notion of color constancy, that is, the perceptual ability, whether in human or machine vision, to determine the surface reflectance property of the target objects given the reflected light from the object surface in the receptive field. If a color constancy algorithm could perform sufficiently well, we could use it for object recognition because it would provide a unique property of the object itself. Unfortunately, however, color constancy is generally difficult to compute in practice, so we cannot use it by itself. The invariant property presented here is efficiently computed from segmented or non-segmented images at the same time as the geometric features are extracted.

2.1  Unavailability  of  color  constancy 

Color constancy is an underconstrained problem, as we will see in the following. Let $S(x, \lambda)$ be the spectral reflectance function of the object surface at $x$, which is the property we have to recover, let $E(x, \lambda)$ be the spectral power distribution of the ambient light, and let $R_k(\lambda)$ be the spectral sensitivity of the $k$th sensor. Then $\rho_k(x)$, the scalar response of the $k$th sensor channel to be observed, is described as

$$\rho_k(x) = \int S(x, \lambda)\, E(x, \lambda)\, R_k(\lambda)\, d\lambda \tag{1}$$

where, generally, $S$ is a function describing geometric and spectral properties of the surface at $x$ that can be an arbitrary function, and $E$ could also be an arbitrary function of $x$ and $\lambda$. The integral is taken over the visible spectrum (usually from 380 to 800 nm). The geometric factor of the object surface, which is usually considered to include the surface normal and the relative angles of the incident and reflected light directions with respect to the surface normal, is crucial in the 3D world [18]. In addition, there are other confounding factors such as specularities and mutual reflections on the surface. With these complexities, to perform color constancy, that is, to recover $S(x, \lambda)$, we need to limit the world to which it is applied. To get a simple intuition of this, we might insert an arbitrary scalar function $C(x)$ in (1) so that we have [33]

$$\rho_k(x) = \int \{S(x, \lambda)\, C(x)\}\, \{E(x, \lambda) / C(x)\}\, R_k(\lambda)\, d\lambda. \tag{2}$$

Clearly, when $S$ with $E$ is a solution for (1), $S' = SC$ with $E' = E/C$ is also a solution for any function $C$. To turn this into a well-posed problem, almost all authors have addressed problems in a strongly constrained world like the Mondrian space [19, 13, 33, 31, 12, 9]: a 2D space composed of several matte patches overlapping each other. Then, based on the observation that both the ambient light and the surface reflectance for planar surfaces can be approximated by linear combinations of a small number of fixed basis functions [7, 21], they can deal with the problem at a fairly feasible level [13, 33, 31, 12, 10, 9]. A good mathematical analysis is given in [10]. However, all of those results are for a 2D world. This two-dimensionality assumption rules out the use of conventional color constancy in recognizing a 3D world. Therefore, we cannot employ conventional color constancy algorithms as presented.
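The ambiguity expressed by Eq. (2) can be checked numerically. The following is a minimal sketch, assuming a coarse discretization of the integral in Eq. (1); all spectra and the scalar `C` are made-up illustrative values, and `sensor_response` is a hypothetical helper name.

```python
# Discretized version of Eq. (1): response = sum over wavelengths of S * E * R_k.
# All spectra below are made-up illustrative samples, not measured data.

def sensor_response(S, E, R_k, d_lambda=1.0):
    """Approximate rho_k = integral of S(lambda) E(lambda) R_k(lambda) d lambda."""
    return sum(s * e * r for s, e, r in zip(S, E, R_k)) * d_lambda

S = [0.2, 0.5, 0.9, 0.4]      # surface reflectance samples (hypothetical)
E = [1.0, 0.8, 0.6, 0.7]      # illuminant power samples (hypothetical)
R_k = [0.1, 0.6, 0.3, 0.0]    # sensor sensitivity samples (hypothetical)

C = 3.7                        # arbitrary scalar C(x) at this position
S_alt = [s * C for s in S]     # alternative reflectance S' = S * C
E_alt = [e / C for e in E]     # alternative illuminant  E' = E / C

rho = sensor_response(S, E, R_k)
rho_alt = sensor_response(S_alt, E_alt, R_k)
print(rho, rho_alt)            # the two responses agree up to rounding
```

Because the sensor sees only the product of reflectance and illuminant, the pair (S', E') is indistinguishable from (S, E) in this sketch, which is the underconstrained situation described above.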

2.2  Some  color  invariants 

Knowing that color constancy is not easily attainable for any plausible 3D world, we propose a photometric invariant property for use in the recognition of 3D objects.

Since it is known that the spectral distribution of the surface reflectance of many materials depends very little on the surface geometry [23], we may break up the surface reflectance function into the product of a geometry term $G(x)$ and a spectral term $L(x, \lambda)$ such that $S(x, \lambda) = G(x) L(x, \lambda)$. Then equation (1) becomes:

$$\rho_k(x) = \int G(x)\, L(x, \lambda)\, E(x, \lambda)\, R_k(\lambda)\, d\lambda = G(x) \int L(x, \lambda)\, E(x, \lambda)\, R_k(\lambda)\, d\lambda \tag{3}$$

[Constant ambient light assumption over the entire surface]

If we assume that the ambient light spectrum distribution is constant over the entire surface of the objects, $E$ becomes simply a function of wavelength $\lambda$. This assumption is justified when the lighting source is sufficiently far away from the object relative to the size of the object surface, and mutual illumination and shadowing are not significant. This yields

$$\rho_k(x) = G(x) \int L(x, \lambda)\, E(\lambda)\, R_k(\lambda)\, d\lambda \tag{4}$$

Taking the ratio between the responses of two channels $i, j$ eliminates the geometric factor $G(x)$, which depends on the relative orientation of the object surface with respect to the camera and/or the lighting source:

$$\frac{\rho_i(x)}{\rho_j(x)} = \frac{\int L(x, \lambda)\, E(\lambda)\, R_i(\lambda)\, d\lambda}{\int L(x, \lambda)\, E(\lambda)\, R_j(\lambda)\, d\lambda} \tag{5}$$

By the same reasoning, we have a similar form after the motion of the object with respect to the camera and/or the lighting source:

$$\frac{\rho'_i(x')}{\rho'_j(x')} = \frac{\int L'(x', \lambda)\, E'(\lambda)\, R_i(\lambda)\, d\lambda}{\int L'(x', \lambda)\, E'(\lambda)\, R_j(\lambda)\, d\lambda} \tag{6}$$

where primes denote functions after the motion; this prime notation applies to any symbol expressing a quantity after the motion of the object in the rest of this paper unless otherwise stated. Note that $L(x, \lambda) = L'(x', \lambda)$, because the spectral property of the surface reflectance is not affected by the object motion. When we approximate the spectral absorption functions $R$ by narrow band filters such that $R_i(\lambda) \approx s_i\, \delta(\lambda_i - \lambda)$, where $s_i$ is the channel sensitivity and $\lambda_i$ is the peak of the spectral sensitivity of the $i$th channel, we obtain the ratios from (5) and (6):

$$\gamma_{ij}(x) = \frac{\rho_i(x)}{\rho_j(x)} = \frac{s_i\, L(x, \lambda_i)\, E(\lambda_i)}{s_j\, L(x, \lambda_j)\, E(\lambda_j)} \tag{7}$$

$$\gamma'_{ij}(x') = \frac{\rho'_i(x')}{\rho'_j(x')} = \frac{s_i\, L(x, \lambda_i)\, E'(\lambda_i)}{s_j\, L(x, \lambda_j)\, E'(\lambda_j)} \tag{8}$$

Since the bandwidth over which a real camera sensor responds varies from camera to camera, and standard sensors may not be very narrow, this is only an approximation. However, experiments show that this assumption is not unrealistic for normal cameras. Taking the ratio of the $\gamma$'s before and after the motion and/or the change of lighting conditions yields

$$\frac{\gamma_{ij}(x)}{\gamma'_{ij}(x')} = \epsilon_{ij} \tag{9}$$

where

$$\epsilon_{ij} = \left\{\frac{E(\lambda_i)}{E(\lambda_j)}\right\} \Big/ \left\{\frac{E'(\lambda_i)}{E'(\lambda_j)}\right\} \tag{10}$$

Since $\epsilon_{ij}$ is independent of the position on the surface, $\gamma_{ij}(x)$ can be regarded as approximately invariant to changes of illuminant conditions and to motions of the object, within a consistent scale factor over the object surface. Note that $\epsilon_{ij}$ depends only on the ratios of the spectrum distribution of the ambient light before and after the motion of the object.
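The position independence of the scale factor in Eqs. (9) and (10) can be illustrated with a small numerical sketch. It assumes the narrow-band model of Eq. (7), so each channel response is a simple product; the reflectances, illuminant values and geometric factors are invented for illustration, and the function names are hypothetical.

```python
def response(G, L, E, s):
    """Narrow-band channel response: rho_i = G * s_i * L(lambda_i) * E(lambda_i)."""
    return G * s * L * E

def gamma(G, L_i, L_j, E_i, E_j, s_i=1.0, s_j=1.0):
    """Channel ratio gamma_ij = rho_i / rho_j, as in Eq. (7)."""
    return response(G, L_i, E_i, s_i) / response(G, L_j, E_j, s_j)

# Hypothetical surface positions: (L at lambda_i, L at lambda_j).
surfaces = [(0.8, 0.2), (0.3, 0.6), (0.5, 0.5)]
E_before = (1.0, 0.9)          # illuminant at (lambda_i, lambda_j) before the motion
E_after = (0.6, 1.2)           # illuminant after the motion or lighting change
G_before = [1.0, 0.7, 0.4]     # geometric factor per position, before
G_after = [0.5, 0.9, 0.2]      # geometric factor per position, after

ratios = []
for (L_i, L_j), g0, g1 in zip(surfaces, G_before, G_after):
    g_before = gamma(g0, L_i, L_j, *E_before)
    g_after = gamma(g1, L_i, L_j, *E_after)
    ratios.append(g_before / g_after)

# Eq. (10): epsilon_ij depends only on the illuminant spectra.
epsilon = (E_before[0] / E_before[1]) / (E_after[0] / E_after[1])
print(ratios, epsilon)
```

In this sketch the geometric factor cancels inside each ratio, and every position yields the same ratio of ratios, so a change of pose and lighting rescales all the $\gamma$ values consistently.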

In using $\gamma$ for object recognition, we might need to normalize its distribution, because in general it is invariant only within a scale factor. When we are provided with the sets of $\gamma$ from corresponding positions over different views, this could be done by applying a normalization process to the original sets:

[equation garbled in the source]  (11)

where $\sigma_{ij}$ is the variance of the given $\gamma_{ij}$ distribution. Note that when the ambient light has not been changed, $\epsilon_{ij} = 1$, so that $\gamma_{ij}(x) = \gamma'_{ij}(x')$, and thus the normalization process is not needed.
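As a rough illustration of how a normalization can remove a global scale factor, the sketch below divides each set of $\gamma$ values by its standard deviation. This specific form is an assumption made for the example, not necessarily the normalization of Eq. (11); the sample values and the factor `epsilon` are hypothetical.

```python
import statistics

def normalize(gammas):
    """Divide by the population standard deviation (assumed form of the normalization)."""
    sigma = statistics.pstdev(gammas)
    return [g / sigma for g in gammas]

gamma_view1 = [4.0, 0.5, 1.0, 2.5]                  # hypothetical gamma values, view 1
epsilon = 2.2                                       # unknown global scale factor
gamma_view2 = [g / epsilon for g in gamma_view1]    # same positions, view 2

n1 = normalize(gamma_view1)
n2 = normalize(gamma_view2)
print(n1)
print(n2)    # matches n1 up to rounding
```

Any statistic that scales linearly with the data would serve the same purpose here; the standard deviation is used only because the text refers to the variance of the distribution.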

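[Only locally constant ambient light assumption]

Now, let us assume only a locally constant ambient light spectrum distribution, instead of the globally constant one over the object surface: $E(x_l, \lambda) = E(x_m, \lambda)$ for nearby positions $x_l, x_m$. Then, eqs. (7) and (8) must be modified respectively as:

$$\gamma_{ij}(x) = \frac{\rho_i(x)}{\rho_j(x)} = \frac{s_i\, L(x, \lambda_i)\, E(x, \lambda_i)}{s_j\, L(x, \lambda_j)\, E(x, \lambda_j)} \tag{12}$$

$$\gamma'_{ij}(x') = \frac{\rho'_i(x')}{\rho'_j(x')} = \frac{s_i\, L(x, \lambda_i)\, E'(x', \lambda_i)}{s_j\, L(x, \lambda_j)\, E'(x', \lambda_j)} \tag{13}$$

Incorporating the assumption, that is, $E(x_l, \lambda) = E(x_m, \lambda)$ and $E'(x'_l, \lambda) = E'(x'_m, \lambda)$, we again have an invariant $\psi_{ij}^{lm}$:

$$\psi_{ij}^{lm} = \frac{\gamma_{ij}(x_l)}{\gamma_{ij}(x_m)} = \left\{\frac{L(x_l, \lambda_i)}{L(x_l, \lambda_j)}\right\} \Big/ \left\{\frac{L(x_m, \lambda_i)}{L(x_m, \lambda_j)}\right\} \tag{14}$$

thus, $\psi_{ij}^{lm} \approx \psi_{ij}'^{\,lm}$. However, $\psi$ is obviously sensitive to perturbations contained in the image signals, especially when the value of $\gamma_{ij}(x_m)$ (the denominator in (14)) is close to zero. To stabilize this, we adopt a normalized measure $\varphi$ in place of $\psi$ itself:

$$\varphi_{ij}^{lm} = \frac{\gamma_{ij}(x_l)}{\gamma_{ij}(x_m) + \gamma_{ij}(x_l)} \tag{15}$$

It is easy to see that $\varphi \approx \varphi'$, that is, $\varphi$ is approximately invariant to changes of illumination conditions and of orientations of the object surfaces.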
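The behavior of the two measures in Eqs. (14) and (15) can be compared in a short sketch. The $\gamma$ values, the scale factor `k` and the noise level are hypothetical numbers chosen to make the denominator of Eq. (14) small.

```python
def psi(gamma_l, gamma_m):
    """Eq. (14): ratio of gamma at two nearby positions."""
    return gamma_l / gamma_m

def phi(gamma_l, gamma_m):
    """Eq. (15): normalized measure, bounded in (0, 1) for positive gammas."""
    return gamma_l / (gamma_m + gamma_l)

# Hypothetical gamma values at two neighboring positions before the change.
gamma_l, gamma_m = 1.8, 0.02
# A locally constant illuminant change scales both by the same unknown factor.
k = 0.37
phi_before = phi(gamma_l, gamma_m)
phi_after = phi(k * gamma_l, k * gamma_m)

# Effect of a small additive perturbation on the near-zero denominator.
noise = 0.01
psi_shift = abs(psi(gamma_l, gamma_m + noise) - psi(gamma_l, gamma_m))
phi_shift = abs(phi(gamma_l, gamma_m + noise) - phi(gamma_l, gamma_m))
print(phi_before, phi_after, psi_shift, phi_shift)
```

Since $\varphi = \psi / (1 + \psi)$, the normalized measure inherits the invariance of $\psi$ while staying bounded, which is why the same perturbation moves it far less in this example.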

Note that for $\gamma_{ij}$ we cannot derive this kind of normalized invariant formula. It is important to remember that, in order to make $\varphi$ useful, the surface reflectance properties associated with the two nearby positions $x_l, x_m$ must be sufficiently different from each other. Otherwise, even if the invariance of $\varphi$ in (15) holds, the $\gamma$'s tend to have the same value at $x_l$ and $x_m$, so the $\varphi$'s always return values close to 0.5 and provide no useful information about the color properties. Fortunately, as we describe later, when color properties are picked up from different sides of brightness boundaries, this situation can often be avoided.


2.3  Related  photometric  invariants 

An invariant related to our photometric invariants was proposed earlier by Faugeras, based on an opponent color model, for image processing applications [8]. The opponent color model was first introduced by Hering [17] to describe the mechanism of human color sensation. He advocated that the three pairs Red-Green, Blue-Yellow, and White-Black form the basis of human color perception. A simple mathematical formulation of this [3], which is a linear transformation of R, G, B, was used as a color invariant in [27, 28] for indexing 3D objects: $[R - G,\ Bl - Y,\ W - Bk]^T = L\,[R, G, B]^T$, where $L$ is a linear transformation. A similar formalization of an opponent color model was also used for the correspondence process in color stereopsis [5]. However, there is no theoretical justification of the linear transformation model for full 3D object surfaces because, as we noted in the derivation of our invariants, the surface orientation in 3D space with respect to the lighting source and the camera is a factor that cannot be ignored (see also [18]) in deriving invariants for a 3D world, and it is never removed by any linear transformation.
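The argument that a linear transformation cannot remove the geometric factor can be made concrete with a small sketch. The opponent matrix used here is an illustrative choice made for this example, not the formulation of [3], and the RGB values are hypothetical.

```python
def opponent(rgb):
    """Illustrative linear opponent transform (coefficients assumed for this sketch)."""
    r, g, b = rgb
    return (r - g, b - (r + g) / 2.0, (r + g + b) / 3.0)

def ratios(rgb):
    """Channel ratios (G/R, B/R), in the spirit of gamma_ij."""
    r, g, b = rgb
    return (g / r, b / r)

base = (120.0, 80.0, 40.0)               # hypothetical RGB of a matte patch
tilted = tuple(0.5 * c for c in base)    # same patch with the geometric factor G halved

opp_base, opp_tilted = opponent(base), opponent(tilted)
rat_base, rat_tilted = ratios(base), ratios(tilted)
print(opp_base, opp_tilted)   # every opponent channel is halved
print(rat_base, rat_tilted)   # the ratios are unchanged
```

Any linear map of (R, G, B) commutes with a common scaling of the three channels, so its outputs scale with $G(x)$, whereas the channel ratios cancel it.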

Unlike this linear transformation case, Faugeras's form uses, for the chromatic model, the logarithm of the ratios between different channel responses, and so is similar to ours; for achromatic responses, it uses the logarithm of the product of the three R, G, B responses, with a low-pass filtering accounting for lateral inhibition.

In [4] a unique illuminant-invariant was proposed which, assuming the existence of at least four local distinct color surfaces, uses the volumetric ratio invariant of the parallelepiped generated by the responses of the three receptors. It seems to us, however, that the assumption of four local distinct color surfaces is too demanding in practice.

Recently, a new photometric invariant was proposed for object recognition [26]. Limiting its application to geometrically continuous smooth surfaces, it uses as an invariant the ratio between the brightnesses of two adjacent regions, each with consistent and different surface spectral reflectance. It therefore requires a preliminary complete segmentation of the image into regions having the same colors. The other assumptions introduced in its derivation are almost the same as ours (the locally constant ambient illuminant case), except for the additional continuous smooth surface constraint over the boundary of two surfaces with different spectral reflectance.

2.4  Experiments 

Experiments were conducted to examine the accuracy of the proposed photometric invariants. Figure 1 shows pictures of a man-made convex polyhedron composed of 6 planar surfaces, each with a different surface orientation. The left picture is a front view of the polyhedron, hereafter pose $P_A$, while in the right picture the object is rotated around the vertical axis (y-axis) by about 30 degrees, hereafter pose $P_B$. On each side of the boundaries of adjacent surfaces, several matte patches with different colors were pasted. We then picked corresponding positions manually within each colored patch in the pictures for the poses ($P_A$, $P_B$). The selected positions within patches are depicted by crosses in the pictures. To test the accuracy of the proposed invariants $\gamma$ and $\varphi$ under varying illuminant conditions and surface orientations of the object with respect to the illuminant and the camera, we took three pictures: the first at pose $P_A$ under the usual lighting conditions ($P_A \,\&\, L_U$), the second at pose $P_B$ under a greenish light ($P_B \,\&\, L_G$), and the third at pose $P_B$ but under a bluish light ($P_B \,\&\, L_B$). To change the source light spectrum, i.e., to get greenish or bluish light, we covered a tungsten halogen lamp with green or blue cellophane. For $\varphi$, the surface positions within planar patches facing each other across the boundaries of planar surfaces were used as neighboring positions to satisfy the requirement of (locally) constant ambient light. To compute the invariants in practice, we used the ratios $G/R$, $B/R$ for $\gamma$ and $\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$, $\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$ for $\varphi$, where R, G, B are the outputs from the Red, Green, and Blue sensor channels respectively, and the indices attached to R, G, B show the sides of the surfaces, with respect to their boundaries, used for computing the $\varphi$'s. As described previously, in theory, when we use the RGB channel outputs to compute invariants, instead of outputs through exact narrow band filters, they might be only pseudo-invariants. However, the following results confirm that the values of $\gamma$ and $\varphi$ computed using RGB are fairly invariant to changes of the illumination conditions as well as the surface orientations. Table 1 gives the correlation coefficients between the sets of values for each invariant measure computed at corresponding positions in different pictures, measured by the following formula:


$$\frac{C_{\alpha\alpha'}}{\sqrt{C_{\alpha\alpha}\, C_{\alpha'\alpha'}}} \tag{16}$$

where the $C_{ab}$'s ($a, b \in \{\alpha, \alpha'\}$) are the covariances between the sets of values of the measure $\alpha$ (e.g., $\gamma$) before ($\alpha$) and after ($\alpha'$) the motion of the objects or the changes of the lighting conditions, defined by:

$$C_{ab} = \sum P(a, b)\,(a - \bar{a})(b - \bar{b}) \tag{17}$$

where $\bar{x}$ is the average of the measure $x$, $P(a, b)$ is the probability density function, and the sum is taken over all corresponding values of the measures $a, b$. A high correlation, giving a value close to 1, shows that the proposed invariant measures remained unchanged within a consistent scale over the set of positions between the two pictures, while a low correlation, giving a value close to 0, means that the values of the measures changed in an irregular manner. For comparison, other color properties including raw (R, G, B), (H, S, V) (hue, saturation, value), and a linear-transformation implementation of the opponent color model [3] are also included.

In these tests, R, G, B, $R - G$, $B - Y$, and $\gamma = G/R, B/R$ are almost equally good, though $\gamma$ is the best among them on average, which means that those properties changed but only within a consistent scale between the different pictures (recall that $\gamma$ is invariant within a scale factor). The reason why R is very good is probably just that we did not happen to change the intensity of the red light spectrum. The values of H, S, V are unexpectedly quite unstable. The measure $\varphi$ is extremely stable.

To see how far the color properties remained unchanged, in addition to the correlative relation, Figure 2 shows the actual distributions of the color properties, where the horizontal axes are the values for pose $P_A$, while the vertical axes are those for pose $P_B$. If the color measures remained unchanged between the two pictures before and after the motions of the object and/or the changes of the light conditions, the distributions should present linear shapes, and their slopes should be close to 1. Indeed, the measure $\varphi$ is found to remain almost unchanged under varying light conditions, while the other color properties H, S, and $\gamma = G/R, B/R$ included for comparison are not. The biases of the slopes of $\gamma$ toward either the horizontal or the vertical axis indicate that the light spectrum changed between the two compared pictures.
Figure 3 shows the constancy of $\gamma$ against the change of the object pose under the same lighting conditions. In other words, unlike in the last experiments, this time the ambient light was not changed between the two pictures, and only the object pose was changed. For comparison, the performance of $B - Y$ (linear-transformation implementation for blue vs. yellow, the second figure from the left) as well as raw B (blue, the first one) is also shown. Note that what should be observed here is how close the slopes of the distributions are to 1. Except for the two samples in the upper area of the figure (the fourth picture), $\gamma = B/R$ is found to be almost unchanged between the two pictures. The two exceptional samples were from patches with an almost saturated blue channel in the picture at pose $P_B$. The performance of $\gamma = G/R$ (the third figure) is almost perfect. On the other hand, $B - Y$ and B are perturbed around the slope of 1, which is probably caused by the perturbed orientations of the patches. This suggests that $\gamma$ may be used for object recognition without applying any normalization process, so that extracting object regions might not be a prerequisite, as long as the lighting conditions are not changed.
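The two diagnostics used in these experiments, the correlation measure of Eqs. (16) and (17) and the slope of the before/after scatter, can be sketched as follows. The sketch assumes uniform weights $P(a, b) = 1/N$ over the corresponding samples and uses a least-squares slope through the origin as a stand-in for reading the slope off the plots; the data values are hypothetical.

```python
import math

def covariance(a, b):
    """Eq. (17) with uniform weights P(a, b) = 1/N over corresponding samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def correlation(before, after):
    """Eq. (16): C_aa' / sqrt(C_aa * C_a'a')."""
    return covariance(before, after) / math.sqrt(
        covariance(before, before) * covariance(after, after))

def slope_through_origin(before, after):
    """Least-squares slope of after = k * before; a value near 1 means unchanged."""
    return sum(x * y for x, y in zip(before, after)) / sum(x * x for x in before)

gamma_before = [0.4, 0.9, 1.5, 2.2]               # hypothetical values at pose P_A
gamma_after = [1.6 * g for g in gamma_before]     # same positions, rescaled by the lighting

print(correlation(gamma_before, gamma_after))
print(slope_through_origin(gamma_before, gamma_after))
```

In this example the correlation is essentially 1 while the slope is 1.6, mirroring the distinction drawn in the text between a measure that is invariant only within a consistent scale and one that is truly unchanged.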

Similarly, Table 2 gives the results of the same tests on a natural object, a doll shown in Figure 4, for which both the ambient light and the object pose were changed. We refer to the poses of the doll as in the above tests on the Test-Object: left pose $P_A$, right pose $P_B$. The first picture was taken under the usual lighting conditions from an oblique angle ($P_A \,\&\, L_U$), and the second and third were taken respectively under a greenish and a bluish light from the front angle ($P_B \,\&\, L_G$, $P_B \,\&\, L_B$). Corresponding positions were picked manually as in the previous tests. As the surface colors varied smoothly, we cannot expect to have picked corresponding points accurately, so unwanted errors could have been introduced in this operation. This time, for $\varphi$, the two positions closest to each other among the selected points were used. In these tests, R, G, B and S, V performed poorly, though H was very good. The linear model $R - G$, $B - Y$ and $\gamma = G/R, B/R$ performed well again, though $\gamma$ was better. The measure $\varphi$ is quite stable again. Unlike the results on the Test-Object, however, since the surface of the doll, especially in the body parts, had similar surface colors at nearby positions, the distribution of $\varphi$, that is, of $\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$ and $\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$, did not spread very well, and thus had weak photometric selectivity, as seen in Figure 5. Therefore, when picking two nearby positions for $\varphi$ for object recognition, it is important that they have different spectral reflectance. For comparison, the values of H, S, and $\gamma$ are also plotted in Figure 5.
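The loss of selectivity for similar neighboring colors can be seen directly from the RGB form of $\varphi_1$ and $\varphi_2$ used in these experiments. The pixel values below are hypothetical, and `phi_pair` is a helper name introduced only for this sketch.

```python
def phi_pair(rgb_side1, rgb_side2):
    """phi_1 (from G/R) and phi_2 (from B/R) for two neighboring positions."""
    r1, g1, b1 = rgb_side1
    r2, g2, b2 = rgb_side2
    phi1 = (g1 / r1) / (g1 / r1 + g2 / r2)
    phi2 = (b1 / r1) / (b1 / r1 + b2 / r2)
    return phi1, phi2

# Hypothetical pixel values on either side of a boundary.
distinct = phi_pair((200.0, 50.0, 40.0), (60.0, 180.0, 90.0))    # clearly different colors
similar = phi_pair((150.0, 100.0, 80.0), (153.0, 101.0, 82.0))   # nearly identical colors

print(distinct)   # both values well away from 0.5
print(similar)    # both values close to 0.5
```

When the two positions have nearly the same color, both measures collapse toward 0.5 regardless of what that color is, so the pair carries little information for discriminating between features.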

2.5  Sensing  limitations 

As we noted in the examination above, the invariant properties are sometimes perturbed around the ideal values predicted by our theory. This is caused mainly by the limited dynamic range of the camera sensors. These effects include color clipping and blooming, as discussed carefully in [23]. When the incident light is too strong and exceeds the dynamic range of the sensor, the sensor cannot respond to that much input and thus clips the level beyond the range. This means that the sensor no longer correctly reflects the intensity of the light. Note that this is very serious for our invariants, because both $\gamma$ and $\varphi$ are ratio invariants, and a basis of their theory is the consistency, whether local or global, of the amount of light falling onto the positions concerned on the object surfaces. Here, our natural and important assumption is that this consistency is correctly reflected in the responses of the sensors. Therefore, if the sensor response does not meet this assumption, our theory no longer holds. The same arguments also hold for the blooming effect. When the incoming light is too strong to be received by a sensor element of the CCD camera, the overloaded charge travels to nearby pixels, thus corrupting the responses of those pixels.
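The effect of clipping on a ratio invariant can be shown with a toy sensor model. The sketch assumes an 8-bit style saturation level of 255 and a linear response below it; the radiance values and the gain are hypothetical.

```python
def clip(value, max_level=255.0):
    """Saturating sensor: responses above max_level are clipped."""
    return min(value, max_level)

def sense(radiance, gain):
    """Hypothetical sensor: scale the incoming light, then clip each channel."""
    return tuple(clip(gain * c) for c in radiance)

def gamma_gr(rgb):
    """gamma = G/R computed from (possibly clipped) channel outputs."""
    r, g, _ = rgb
    return g / r

radiance = (100.0, 180.0, 60.0)    # made-up unclipped (R, G, B) intensities
dim = sense(radiance, 1.0)         # all channels within range
bright = sense(radiance, 2.0)      # green would reach 360 and is clipped to 255

print(gamma_gr(dim))       # ratio of the true intensities
print(gamma_gr(bright))    # distorted, because only the green channel saturated
```

Doubling the light should leave $G/R$ untouched, but once a single channel saturates the ratio changes, which matches the two outlying samples with a nearly saturated blue channel reported above.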

3  Combining photometric and geometric constraints for 3D object recognition

In this section, we describe how we can exploit the photometric invariants developed in the preceding section for recognizing 3D objects. The basic idea is to combine them with the Centroid Alignment approach we have recently proposed in [25].

3.1  Centroid invariant of geometric feature groups

We argued in [25] that when an object undergoes a linear transformation caused by its motion, the centroid of a group of 3D surface points is transformed by the same linear transformation. Thus, it was shown that under an orthographic projection model, centroids of 2D image geometric features always correspond over different views regardless of the pose of the object in space. This is true for any object surface (without self-occlusion). This property is very useful, because if we have some way to obtain corresponding feature groups over different views, we can replace the simple local features used for defining alignment in conventional methods by those groups, thereby reducing computational cost. We demonstrated the effectiveness of this approach to object recognition on natural as well as simulation data [25].
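The centroid property can be verified numerically with a minimal sketch. It assumes a purely linear 3D motion followed by an orthographic projection that drops the z coordinate; the point group and the matrix `A` are arbitrary illustrative values, and the helper names are hypothetical.

```python
def apply_linear(A, p):
    """Apply a 3x3 linear map A (list of rows) to a 3D point p."""
    return tuple(sum(a * c for a, c in zip(row, p)) for row in A)

def centroid(points):
    """Coordinate-wise mean of a list of equally sized tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def project(p):
    """Orthographic projection onto the image plane: drop z."""
    return (p[0], p[1])

# Hypothetical group of 3D surface points and an arbitrary linear map.
group = [(1.0, 2.0, 0.5), (3.0, -1.0, 2.0), (0.0, 4.0, 1.5), (2.0, 2.0, -1.0)]
A = [[0.9, -0.3, 0.2],
     [0.4, 0.8, -0.1],
     [-0.2, 0.1, 1.1]]

moved = [apply_linear(A, p) for p in group]

# Centroid of the projected, transformed points ...
image_centroid = centroid([project(p) for p in moved])
# ... equals the projection of the transformed 3D centroid.
expected = project(apply_linear(A, centroid(group)))
print(image_centroid, expected)
```

Both the linear map and the projection commute with averaging, so the image centroid of a feature group can stand in for a single corresponding point, provided the same group is found in both views.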

3.2  Grouping by photometric and geometric constraints

To obtain corresponding groups of 2D geometric features, we can use the proposed photometric invariant measures associated with each feature.

In [25], to obtain corresponding geometric feature groups, a clustering operation with a rotationally invariant criterion was applied in coordinates that had been normalized up to a rotation prior to clustering. This time, we again use a clustering technique to obtain corresponding geometric feature groups in different views. Our intention is to yield corresponding cluster configurations using a criterion incorporating spatial proximity constraints of geometric features and the invariance of their associated photometric invariants that we have proposed. Therefore, we assume that surface colors (surface spectral reflectance) vary mostly from place to place. In other words, within some local areas surface colors are almost consistent. Note that this assumption should be justified for most object surfaces, because otherwise we would always be seeing diffused colors over the surface and thus always have difficulty in trying to distinguish surfaces. We also normalize the geometric feature distributions by the linear transformation we presented in [25]. This transformation has been confirmed, both mathematically and empirically, to generate a unique distribution up to a rotation for feature sets from a planar surface on the object, regardless of the surface orientation in 3D space. We note that even 3D object surfaces often tend to be nearly planar in their visible parts, which justifies the use of our transformation for 3D object surfaces. This will be seen later in the experiments.

3.3  Implementation 

We employ the K-means clustering algorithm, in which the criterion is rotationally invariant, to obtain corresponding feature groups in the feature sets from different views. The feature vector $f$ used in clustering is the local geometric feature extended with photometric properties, defined as:

$$ f = [f_g^T,\; s f_p^T]^T \qquad (18) $$

where $f_g = (x, y)^T$ is the 2D geometric feature, composed of the spatial coordinates of a feature point in the xy image plane, $f_p$ is the vector of photometric invariant properties proposed in the preceding sections, and $s$ is a balancing parameter. Note that what we ultimately need here is simply the configuration of geometric features, that is $f_g$, in the clustering results; the photometric invariant is used only as a cue in performing the clustering.
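As an illustration of eq. (18), the sketch below builds the extended vectors and runs a plain Lloyd-style K-means on them, then reads off the centroids of the geometric part only. It is a sketch under stated assumptions: the initial centers are supplied by the caller, the Euclidean criterion stands in for the rotationally invariant criterion, and all names and values are hypothetical.

```python
import math

def extend(points, invariants, s):
    """Build f = [f_g, s * f_p] for each feature, as in eq. (18)."""
    return [tuple(g) + tuple(s * v for v in p)
            for g, p in zip(points, invariants)]

def kmeans(vectors, init_centers, iters=50):
    """Plain Lloyd iterations with Euclidean distance; returns (labels, centers)."""
    centers = [tuple(c) for c in init_centers]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda k: math.dist(v, centers[k]))
                  for v in vectors]
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, l in zip(vectors, labels) if l == k]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def geometric_centroids(points, labels, k):
    """Centroid of f_g only, per cluster (None for an empty cluster)."""
    cents = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        cents.append(tuple(sum(v) / len(members) for v in zip(*members))
                     if members else None)
    return cents

# Hypothetical features: a unit square whose left and right edges differ in color.
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
invariants = [(0.2,), (0.8,), (0.2,), (0.8,)]
vectors = extend(points, invariants, s=10.0)
labels, _ = kmeans(vectors, init_centers=[vectors[0], vectors[1]])
cents = geometric_centroids(points, labels, k=2)
```

With a large balancing parameter $s$ the photometric cue dominates and the features are grouped by color; only the geometric centroids in `cents` would be passed on to alignment.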

After the clustering, an alignment process starts by using the centroids of the clusters so derived to recover the transformation that generated the novel view, i.e., the image data, from the model. It is known that only three point correspondences suffice to recover the transformation, either by using the Linear Combination of models [32] or a full 3D object model [20]. Therefore, we examine every possible combination of triples of model and data cluster centroids generated by the clustering, and select the transformation that best fits the data to the model in terms of their match. In our tests, described later, the number of clusters could be kept below 10. Further, we should note that we only need to consider combinations of model and data cluster centroids that have compatible values of $\gamma$ or $\varphi$. This means that adding photometric properties contributes not only to the clustering but also to the selectivity of the features (cluster centroids). Therefore, considering the computational complexity of the conventional alignment approach to recognition, this should bring a noticeable computational improvement.
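The pruning effect of photometric compatibility on the triple search can be sketched as follows. This is an illustration only: the tolerance test, the function names, and the invariant values are assumptions, and the transformation fitting and scoring steps are omitted.

```python
from itertools import combinations, permutations

def compatible(a, b, tol):
    """True if two invariant vectors agree within tol in every component."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

def candidate_triples(model_inv, data_inv, tol=None):
    """Yield (model triple, data triple) index pairs to be tested by alignment.

    With tol=None every pairing is produced; otherwise each matched
    centroid pair must have compatible photometric values.
    """
    for m in combinations(range(len(model_inv)), 3):
        for d in permutations(range(len(data_inv)), 3):
            if tol is None or all(compatible(model_inv[i], data_inv[j], tol)
                                  for i, j in zip(m, d)):
                yield m, d

# Hypothetical invariant values attached to four model and four data centroids.
model_inv = [(0.2, 0.9), (0.5, 0.4), (0.8, 0.7), (1.1, 0.3)]
data_inv = [(0.81, 0.69), (0.21, 0.91), (1.09, 0.31), (0.49, 0.41)]

n_all = sum(1 for _ in candidate_triples(model_inv, data_inv))
n_pruned = sum(1 for _ in candidate_triples(model_inv, data_inv, tol=0.05))
```

In this toy case the exhaustive search visits 96 pairings, while the compatibility test leaves 4. How much is saved in practice depends on how distinct the invariant values of the clusters are.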

4  Empirical  results 

In this section, we show experimental results of our algorithm for identifying corresponding positions in different views. Tests were conducted on natural pictures containing the 3D objects to be recognized.

4.1  Preliminaries 

The geometric features used by our algorithm are extracted as follows:

(Step 1) Use an edge detector [6] after preliminary smoothing to obtain edge points from the original gray level images.

(Step 2) Link individual edge points to form edge curve contours.

(Step 3) Using local curvatures along the contours, identify corners and inflection points as features, by detecting high-curvature points and zero crossings of the curvature respectively, based on the method described in [20]. Before detecting such features, we smooth the curvatures along the curves [2].
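Step 3 can be sketched on an already linked contour as follows. This is a simplified stand-in, not the method of [20] or the smoothing of [2]: it uses the signed turning angle of a polyline as a curvature proxy, a 1-2-1 kernel for smoothing, and a hypothetical threshold.

```python
import math

def turning_angles(contour):
    """Signed turning angle at each interior point of a polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(contour, contour[1:], contour[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

def smooth(values):
    """1-2-1 weighted smoothing with replicated end values."""
    if not values:
        return []
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
            for i in range(len(values))]

def corners_and_inflections(contour, corner_thresh=0.5):
    """Return (corner indices, inflection indices) into the contour.

    A corner is a local maximum of |curvature| above corner_thresh.
    An inflection index i means the curvature changes sign between
    contour points i and i + 1.
    """
    kappa = smooth(turning_angles(contour))  # kappa[i] belongs to contour[i + 1]
    corners, inflections = [], []
    for i, k in enumerate(kappa):
        left = abs(kappa[i - 1]) if i > 0 else 0.0
        right = abs(kappa[i + 1]) if i + 1 < len(kappa) else 0.0
        if abs(k) >= corner_thresh and abs(k) >= left and abs(k) > right:
            corners.append(i + 1)
    for i in range(len(kappa) - 1):
        if kappa[i] * kappa[i + 1] < 0:
            inflections.append(i + 1)
    return corners, inflections

# Hypothetical contours: an L-shaped polyline and a small S-shaped one.
l_shape = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
s_shape = [(0, 0), (1, 0), (2, 1), (3, 1)]
```

On `l_shape` the only corner is reported at contour index 3, the point (3, 0); on `s_shape` no corner passes the threshold and one inflection is reported between contour points 1 and 2.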

In obtaining color attributes from corresponding positions, we should note that the positions of the geometric features thus extracted in different views do not always correspond exactly in the discrete image coordinate space. This is not only due to quantization error, but also because the edges detected to derive feature points can shift to the other side of the surface boundary under a rotation of the object within the image plane. This is serious because gray level edges often tend to coincide with color edges [5]. So, we cannot simply use the color attributes at the geometric feature points derived from gray level edges. To solve this problem, we picked up color values at two positions on opposite sides of the gray level boundary, displaced from the geometric feature position in opposite directions along the local normal of the contour, and used the color values from both positions. As we do not know which side of an edge in one picture corresponds to which side in another, the distance metric between the photometric invariant vectors associated with two different feature positions should be independent of the correspondence of the sides. Thus, the actual measure used for the photometric invariant vector $f_p$ and the distance metric between two such vectors (used for computing the values of the clustering criterion) are designed to be symmetric with respect to the two sides of the boundary: $f_p = [f_p^{1T}, f_p^{2T}]^T$, where

$$ f_p^i = (G^i/R^i,\; B^i/R^i) $$

for $\gamma$, and

$$ f_p^i = \left( \frac{G^i/R^i}{G^i/R^i + G^j/R^j},\; \frac{G^j/R^j}{G^i/R^i + G^j/R^j},\; \frac{B^i/R^i}{B^i/R^i + B^j/R^j},\; \frac{B^j/R^j}{B^i/R^i + B^j/R^j} \right) $$

for $\varphi$. The indices $(i, j) \in \{(1,2), (2,1)\}$ denote the sides of the surface with respect to the boundary. The distance metric between $f_{p1}$ and $f_{p2}$ for geometric feature positions 1 and 2 is:

$$ |f_{p1} - f_{p2}|^2 = \min\{\, \|f_{p1}^1 - f_{p2}^1\|^2 + \|f_{p1}^2 - f_{p2}^2\|^2,\;\; \|f_{p1}^1 - f_{p2}^2\|^2 + \|f_{p1}^2 - f_{p2}^1\|^2 \,\} \qquad (19) $$

where $\|\cdot\|$ denotes the Euclidean distance. This metric is clearly symmetric with respect to the sides of the surfaces across the gray level boundaries, and is invariant to rotation of the objects within the image plane. The following experiments test our algorithm with both of the proposed invariants, $\gamma$ and $\varphi$. For each feature position, the associated invariant $\varphi$ was computed using the color attributes of the two points mentioned above, that is, two points a little away from the geometric feature point in opposite directions along the contour normal. As described earlier, since gray level edges tend to coincide with color edges, the color values collected from those two positions facing each other across the gray level edge are usually quite different, thereby producing $\varphi$ distributions that spread over the feature space. To satisfy the requirement for $\gamma$, namely that corresponding sets of points be available in the model and the data views, the object regions were extracted prior to the application of our algorithm. This was done manually, though we expect that it could be done automatically using cues such as motion, color, and texture (see e.g. [30, 29, 27, 28]). Note, however, that in using $\varphi$ this region extraction is not necessarily required, as long as the background in the picture happens to have different colors from the object. This is because $\varphi$ is a complete invariant, unlike $\gamma$, which needs further normalization to remove scale factors, as we have argued. The same is true for $\gamma$ when the ambient light has not changed before and after the motion of the objects. Hereafter, we refer to the normalized measure of $\gamma$ simply as $\gamma$.
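The two-sided measure and the symmetric metric of eq. (19) can be sketched as follows. This is a minimal illustration: the function names and RGB values are hypothetical, red components are assumed to be nonzero, and the illumination change is modeled as a per-channel scale factor, which is an assumption made here for the example.

```python
def gamma_side(rgb):
    """(G/R, B/R) for the color sampled on one side of the edge."""
    r, g, b = rgb
    return (g / r, b / r)

def phi_side(rgb_i, rgb_j):
    """The four phi components for side i, given the opposite side j."""
    gi, bi = gamma_side(rgb_i)
    gj, bj = gamma_side(rgb_j)
    return (gi / (gi + gj), gj / (gi + gj), bi / (bi + bj), bj / (bi + bj))

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def symmetric_sq_distance(fp_a, fp_b):
    """Eq. (19): each argument is a pair (side-1 vector, side-2 vector)."""
    (a1, a2), (b1, b2) = fp_a, fp_b
    return min(sq_dist(a1, b1) + sq_dist(a2, b2),
               sq_dist(a1, b2) + sq_dist(a2, b1))

# Hypothetical colors sampled on the two sides of one edge.
c1, c2 = (100, 50, 25), (200, 50, 150)

# The same feature seen with its sides swapped gives distance zero.
fp_view_a = (gamma_side(c1), gamma_side(c2))
fp_view_b = (gamma_side(c2), gamma_side(c1))
d_swapped = symmetric_sq_distance(fp_view_a, fp_view_b)

# Under an assumed per-channel illuminant scaling, phi is unchanged.
def relight(rgb, scale=(1.0, 2.0, 0.5)):
    return tuple(v * k for v, k in zip(rgb, scale))

phi_before = phi_side(c1, c2)
phi_after = phi_side(relight(c1), relight(c2))
```

Taking the minimum over the two side pairings is what makes the metric independent of which side was labeled 1. The relighting example also shows why $\gamma$ alone needs a further normalization: its components are rescaled, while the ratios in $\varphi$ cancel the scale.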

4.2  Experiments 

We tested our algorithm to see how accurately it can identify corresponding positions over different pictures taken under varying light conditions and poses of the objects to be recognized. Identifying corresponding positions perfectly is not an easy task, because we must contend with two different kinds of instability: one in extracting geometric features, the most serious form of which is missing features, and the other inherent in the photometric properties of the image, such as those described in the arguments on sensing limitations. Recall, however, that for our ultimate objective, that is, recognizing objects using the identified positions, only three correspondences are sufficient under an orthographic projection model [32] or a weak perspective projection model [20]. Therefore, what should be observed in the following results is whether or not our algorithm could identify at least this minimum number of correspondences. First, the results of using $\gamma$ as the photometric invariant are shown.

[With $\gamma$ for photometric invariant]

Figure 6 shows the results of obtaining feature group centroids on the Band-Aid-Box pictures. The surface of the box carries characters of several different colors on a white base. All the pictures were taken so as to include the same three surfaces of the box, which are the ones used for the recognition. The figures in the first row from the top show the edge maps, with the extracted geometric features superimposed as small closed circles. The first picture from the left (hereafter, the first picture) was taken under usual light conditions. The second and third pictures from the left (hereafter, the second and third pictures) were taken under a greenish and a bluish light respectively, at a different pose from the first one. Throughout the rest of the paper, we refer to the figures by their order from the left in this way. The lighting conditions were changed in the same way as in the experiments presented in section 2.4. The figures in the second and third rows show the original and normalized distributions of $\gamma$ respectively. The horizontal axes are for G/R, while the vertical axes are for B/R. These figures show how well the invariant property $\gamma$ remained unchanged between the different pictures. When it performs well, the original distributions of $\gamma$ should show a similar shape over the different views, except for some scale change along the axes. That scale distortion (e.g., dilation) should then be corrected by the normalization of the distribution, ideally yielding linear distributions of slope 1. Note that even if the shapes of the distributions are distorted in addition to the dilation, we cannot conclude that the proposed invariants performed poorly, because unstable results of the geometric feature extraction will also distort the shape of the distribution of the photometric properties. The intermediate results of clustering are shown in the fourth row, in the normalized coordinates of the geometric features. In the figures of the first row, the corresponding positions identified by our algorithm are superimposed as large closed circles. The accuracy of our algorithm is found to be fairly good there. The perturbations of the identified positions appear to have been caused in part by unstable results of feature extraction, e.g., missing features, rather than by clustering errors or incompleteness of the proposed photometric invariant.

Figure 7 gives the results on the Spaghetti-Box pictures, taken in the same way as the Band-Aid-Box pictures. The surfaces of this box carry textures including large and small characters, and are somewhat more cluttered than the Band-Aid-Box surface. The first row shows the edges with the extracted geometric features superimposed on them. The first picture was taken under usual light conditions. The second and third pictures were taken under a greenish and a bluish light respectively, at different poses. The second and third rows show the original and normalized distributions of $\gamma$ respectively. The algorithm identified the corresponding positions fairly accurately, as we see in the top figures.

Similarly, Figure 8 presents the results on the Doll pictures (the same doll as the one used in section 2.4). Unlike the last two examples, the surface of this doll does not have man-made texture such as characters, but only has color/brightness changes, partly due to changes of material and partly due to depth variations. The surface is mostly smooth except for some parts including the hair, face, and fingers. The pictures in the first row show the edges with the extracted geometric features superimposed on them. The first and second pictures were taken under usual light conditions, but at different poses of the doll. The third picture was taken under a moderate greenish light plus the usual room light. For the fourth picture, we used an extremely strong tungsten halogen lamp covered with bluish cellophane. The second and third rows show the original and normalized distributions of $\gamma$ respectively. Comparing the shapes of the original and normalized distributions of $\gamma$ for the first and second pictures, we can confirm that when the light conditions have not been changed, the distributions of $\gamma$ are not affected by the change of pose of the object. The algorithm identified the corresponding positions fairly accurately, as we see in the pictures.

[With $\varphi$ for photometric invariant]

The results of using $\varphi$ as the photometric invariant on the same pictures used for $\gamma$ are shown next. Figure 9 presents the results on the Band-Aid-Box pictures. The first row shows the edge maps, with the extracted geometric features superimposed as closed circles. In the second row, the respective distributions of $\varphi$ are shown. The horizontal axes are for $(G^i/R^i)/(G^i/R^i + G^j/R^j)$, while the vertical axes are for $(B^i/R^i)/(B^i/R^i + B^j/R^j)$, where $(i, j) \in \{(1,2), (2,1)\}$. As described already, since we do not know the correspondence of the sides of the surface across the edges (contours), we included properties from both sides of the edges. Consequently, the distributions of $\varphi$ are 2-fold symmetric around their centroid, as seen in the second row figures (see eq. (15)). When $\varphi$ performs well as an invariant, this distribution should remain unchanged over different pictures. The second row figures thus demonstrate a fairly good performance for these pictures. The intermediate results of clustering are given in the third row, in the normalized coordinates of the geometric features. In the figures of the first row, the corresponding positions identified by our algorithm are also superimposed as large closed circles. The accuracy of our algorithm is again found to be fairly good.

Figure 10 gives the results with $\varphi$ on the Spaghetti-Box pictures. The first row shows the extracted geometric features. The second row shows the distributions of $\varphi$. The performance of $\varphi$ is almost perfect. As we see in the pictures, the algorithm with $\varphi$ identified the corresponding positions very well.

Figure 11 presents the results on the Doll pictures. In the first row, the edge maps with the extracted geometric features superimposed on them are shown. The second row shows the respective distributions of $\varphi$. Since we used an extremely intense blue light for the fourth picture, the blue channel of many pixels was saturated. As a consequence, the distribution of $\varphi$ shrank in the vertical direction, as seen in the fourth picture of the second row. For these doll pictures, the results of identifying corresponding positions with $\varphi$ were generally not as good as those with $\gamma$, though not very bad. This is probably because the surface colors of the doll vary quite smoothly in most parts, so the distribution of $\varphi$ did not spread well and did not separate the clusters well in terms of color.

5  Discussions  and  conclusion 

We argued that by tightly combining the proposed photometric invariants with geometric constraints, we can realize very efficient and reliable recognition of 3D objects. Specifically, we conducted experiments on identifying corresponding feature positions over different views taken under different conditions. Although we did not include demonstrations of the actual recognition process, by connecting the presented method for identifying features using photometric invariants with popular recognition algorithms, such as the full 3D model method [20] or the Linear Combination of models [32], we can perform object recognition quite efficiently, as described. This may be demonstrated elsewhere. In the experiments, we showed that our methods could tolerate perturbations in both color and geometric properties, and could provide at least the minimum number of position correspondences necessary for object recognition. Although we extracted the object regions manually in the experiments, this is sometimes easily done from sequences of images or against a simple background, or may be performed using color segmentation. In addition, we stress again that as long as the background has different colors from the object, we can use $\varphi$ without any preliminary processing for region extraction. This also holds for $\gamma$ when the ambient light has remained unchanged. The weakness of $\varphi$ appears when the discontinuities of gray level do not coincide with those of color. In this case, the distribution of $\varphi$ does not spread very well. This emerged in the body parts of the doll. Compared with conventional approaches that match local features, whose number is on the order of several hundred, the computational cost of our approach for recognizing 3D objects should be very small. The time for identifying the (about 10) corresponding feature positions, i.e., cluster centroids, was around 0.2 sec for pictures with several hundred features. In addition, we can use the invariant photometric values in searching for the correspondences between the derived feature points in the model and the image, so that needless searches can be further suppressed.

The advantages of our approach compared with Nayar's are as follows. Their method uses invariant photometric properties derived for regions, each with a consistent and distinct color, so that color segmentation is a prerequisite. In our view, this color segmentation is the essential process that reduces the size of the search space for correspondences, and the photometric invariant was used only for further limiting the possible matches between the model and the data regions. Unfortunately, however, achieving complete color segmentation is often quite hard and time consuming [29]. Of course, the photometric invariant can still contribute to reducing the computational cost, since in general the number of color regions in the entire image could still be on the order of some tens. But its contribution to the reduction of computational cost appears to be smaller than that of color segmentation. In contrast to their approach, since our photometric invariant can be computed purely locally, we do not necessarily need color segmentation, as mentioned above, so our method is less demanding. In addition, since the color properties are passed to the subsequent clustering and feature centroid alignment process, our method can tolerate many confounding factors that arise in real-world applications, such as inaccuracies in region and/or feature extraction. The clustering and feature centroid alignment process is well suited to compensating for those uncertainties. We should also point out that, in theory, the region centroids they used for matching cannot be used for 3D surfaces, while our feature centroids can.

An alternative way of using the proposed photometric invariant in recognition is simply to incorporate it into the conventional framework of recognition. For example, in selecting features to form hypothesized corresponding triples of features between the model and the data, photometric properties can be used to limit the possible matches between the model and the data features, pruning many needless combinations from the search space and thereby effectively reducing the computational cost. This kind of idea has been used in [26] for matching corresponding regions.

Figure 1: Tests of Invariant on Convex Polyhedron

The pictures show the convex polyhedron in different poses: left, pose $P_A$; right, pose $P_B$. This object is composed of 6 planar surface patches, each with a different surface orientation. On each side of the boundary between adjacent surfaces, several matte patches with different colors were pasted. We then picked corresponding positions manually within each colored patch in both pictures. The selected positions within patches are depicted by crosses.


Property                                            $P_A \wedge L_U$ - $P_B \wedge L_G$    $P_A \wedge L_U$ - $P_B \wedge L_B$
R                                                   0.988368                               0.989877
G                                                   0.967951                               0.974081
B                                                   0.946251                               0.882816
H                                                   0.724681                               0.701377
S                                                   0.914236                               0.749529
V                                                   0.945473                               0.668672
R-G                                                 0.985398                               0.985687
B-Y                                                 0.935039                               0.908867
G/R                                                 0.978163                               0.988289
B/R                                                 0.962186                               0.907126
$\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$         0.997766                               0.997532
$\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$         0.991843                               0.988893

Table 1: Correlation coefficients between the sets of values of the color properties from different pictures of the Test-Object.

The correlation coefficients between the sets of values of the proposed invariants from pictures taken under different light conditions and at different poses of the object are given to show how well they remain unchanged within a consistent scale. For comparison, other color properties including (R, G, B), (H, S, V), and a linear-transformation implementation of the opponent color model [3] are also presented. In these tests, (R, G, B), (R-G, B-Y), and $\gamma$ = (G/R, B/R) are almost equally good, though $\gamma$ is the best among them. The reason why R also does well is probably just that we did not happen to change the intensity of the red part of the light spectrum. The values of (H, S, V) (hue, saturation, value) are unexpectedly unstable. The measure $\varphi$ is extremely stable.
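A correlation coefficient of the kind tabulated above can be computed as in the sketch below. The text does not state which coefficient was used; the Pearson coefficient is assumed here, and the sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values of one property at corresponding positions in two pictures.
view_a = [0.40, 0.55, 0.90, 1.20, 0.75]
view_b_scaled = [1.5 * v for v in view_a]        # pure scale change
view_b_mixed = [0.90, 0.40, 1.20, 0.55, 0.75]    # same values, shuffled
```

A pure scale change between the two pictures leaves the coefficient at 1, which matches the reading "unchanged within a consistent scale"; shuffled values give a clearly lower coefficient.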

Figure 2: Distributions of invariants on Convex Polyhedron

The left two columns are from pictures taken under $P_A \wedge L_U$ (horizontal axis) and $P_B \wedge L_G$ (vertical axis), and the right two columns are from pictures taken under $P_A \wedge L_U$ (horizontal axis) and $P_B \wedge L_B$ (vertical axis). The rows in each pair of columns are, respectively: top left and right, H and S; middle left and right, G/R and B/R; bottom left and right, $\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$ and $\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$.

Figure 3: Tests of $\gamma$ at different poses of object but under the same illuminant conditions

From the left: the distribution of Blue (B), B-Y (Blue vs. Yellow), G/R, and B/R. The horizontal axis is for the pose $P_A$ and the vertical axis is for the pose $P_B$. Except for the two samples in the upper right area of the distribution, $\gamma$ = B/R is found to be almost unchanged between the two pictures, because the slope is almost 1, while B and B-Y are perturbed around the slope of 1. Those two exceptional samples were from patches with an almost saturated blue channel in the picture at pose $P_B$. The distribution of $\gamma$ = G/R is almost perfect. This gives evidence that $\gamma$ may be used for object recognition without applying any normalization process, so that extracting object regions might not be a prerequisite, as long as the lighting conditions are not changed.


Figure  4:  Tests  of  Invariant  on  natural  pictures 

The  pictures  show  a  doll  at  different  poses:  left  pose  A,  right  pose  B.  We  picked  up  corresponding  positions  in  both  views. 
The  selected  positions  are  depicted  by  crosses. 

Property                                            $P_A \wedge L_U$ - $P_B \wedge L_G$    $P_A \wedge L_U$ - $P_B \wedge L_B$
R                                                   0.764343                               0.819267
G                                                   0.588161                               0.881416
B                                                   0.936572                               0.843604
H                                                   0.951843                               0.923887
S                                                   0.934587                               0.490994
V                                                   0.398850                               0.459425
R-G                                                 0.764240                               0.939152
B-Y                                                 0.948642                               0.877519
G/R                                                 0.779377                               0.944164
B/R                                                 0.962186                               0.895180
$\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$         0.996245                               0.998781
$\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$         0.988840                               0.983675

Table 2: Correlation coefficients between the sets of values of the color properties from different pictures of the Doll.

The results on a natural object, a doll, are given. The first picture was taken under usual lighting conditions from an oblique angle ($P_A \wedge L_U$); the second and third were taken under a greenish and a bluish light respectively, from the front ($P_B \wedge L_G$, $P_B \wedge L_B$). This time, for $\varphi = (\varphi_1, \varphi_2)$ with $\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$ and $\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$, the two positions that are closest to each other are used. In these tests, (R, G, B) and (H, S, V) were very unstable. The linear model (R-G, B-Y) and $\gamma$ = (G/R, B/R) performed well again, though $\gamma$ was better. The measure $\varphi$ is quite stable again.

Figure 5: The distributions of invariant measures on Doll pictures.

The left two columns are from pictures taken under $P_A \wedge L_U$ (horizontal axis) and $P_B \wedge L_G$ (vertical axis), and the right two columns are from pictures taken under $P_A \wedge L_U$ (horizontal axis) and $P_B \wedge L_B$ (vertical axis). The rows in each pair of columns are, respectively: top left and right, H and S; middle left and right, G/R and B/R; bottom left and right, $\varphi_1 = (G^1/R^1)/(G^1/R^1 + G^2/R^2)$ and $\varphi_2 = (B^1/R^1)/(B^1/R^1 + B^2/R^2)$.

Figure 6: Tests with $\gamma$ on Band-Aid-Box pictures.

Edge maps are shown in the first row, with the extracted geometric features superimposed on them. The first picture (from the left) was taken under usual light conditions. The second and third pictures were taken under a greenish and a bluish light respectively, at a different pose. The corresponding positions identified by our algorithm are also superimposed as large closed circles. The figures in the second and third rows show the original and normalized distributions of $\gamma$ respectively. The intermediate results of clustering are shown in the fourth row, in the normalized coordinates of the geometric features.

Figure 7: Tests with $\gamma$ on Spaghetti-Box pictures

The surfaces of this box carry colored textures including large and small characters. The pictures in the first row show the edges with the extracted geometric features superimposed on them. The first picture (from the left) was taken under usual light conditions. The second and third pictures were taken under a greenish and a bluish light respectively, at a different pose from the first one. The second and third rows show the original and normalized distributions of $\gamma$ respectively. The identified positions are depicted by large closed circles in the figures of the first row. The algorithm identified the corresponding positions fairly accurately, as we see in the upper figures.

Figure 8: Tests with $\gamma$ on Doll pictures

The surface of this doll does not have man-made texture such as characters, but only has color/brightness variation, partly due to changes of material and partly due to depth variations. The surface is mostly smooth except for some parts including the hair, face, and fingers. The first row shows the edge maps, with the extracted geometric features superimposed as small closed circles. The first and second pictures (from the left) were taken under usual light conditions, but at different poses of the doll. The third picture was taken under a moderate greenish light plus the usual room light. For the fourth picture, we used an extremely strong tungsten halogen lamp covered with bluish cellophane. The second and third rows show the original and normalized distributions of $\gamma$ respectively. The identified positions are depicted by large closed circles in the figures of the first row. The algorithm identified the corresponding positions fairly accurately, as we see in the figures.


Figure 9: Tests with ψ on Band-Aid-Box pictures

The pictures in the upper row show the edge maps with the extracted geometric features superimposed on them. The first
picture (from the left) was taken under usual lighting conditions. The second and third pictures were taken under
a greenish and a bluish light, respectively, at a different pose from the first one. The second-row figures show the respective
distributions of ψ. The third-row figures show the intermediate results of the clustering. The identified positions are
depicted by large closed circles in the figures of the first row.
Figure 10: Tests with ψ on Spaghetti-Box pictures

The surface of this box includes colored textures, including large and small characters. The upper pictures show the edge maps
with the extracted geometric features superimposed on them. The first picture was taken under usual lighting conditions. The
second and third pictures were taken under a greenish and a bluish light, respectively, and at a different pose. The lower
figures show the respective distributions of ψ. The identified positions are depicted by large closed circles in the figures
of the upper row. The algorithm identified the corresponding positions fairly accurately, as seen in the upper figures.
Figure 11: Tests with ψ on Doll pictures

The surface of this doll has no man-made texture such as characters; it has only color and brightness variation due to the
change of material. The surface is mostly smooth except for some parts, including the hair, face, and fingers. The pictures
in the upper row show the edge maps with the extracted geometric features superimposed on them. The first and second pictures
were taken under usual lighting conditions, but at different poses of the doll. The third picture was taken under a moderate
greenish light, and the fourth picture was taken under an extremely bright bluish light. The lower figures show the respective
distributions of ψ. The identified positions are depicted by large closed circles in the figures of the upper row. The
algorithm identified the corresponding positions fairly well, as seen in the pictures.